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O [ Abstract 

O ' I show how to construct Monte Carlo algorithms (programs), prove 

c^ . that they are correct and document them. Complicated algorithms are 

build using a handful of elementary methods. This construction process 

Q^' is transparently illustrated using graphical representation in which com- 

plicated graphs consist of only several elementary building blocks. In 
particular I discuss the equivalent algorithms, that is different MC algo- 

^ ' rithms, with different arrangements of the elementary building blocks, 

which generate the same final probability distribution. I also show how 



^^ , to transform a given MC algorithm into another equivalent one and 

^ [ discuss advantages of the various "architectures" . 

G^ . To be submitted somewhere, sometime (or may be not) 
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1 Introduction 

The aim of this report is to provide: 

• Elementary description of elementary Monte Carlo (MC) methods for a 
graduate student in physics, who is supposed to learn them in a couple 
of days, 

• Methods of transparent documenting of the MC programs, 

• Reference in publications where there is no space for description of the 
elementary MC methodology. 

In my opinion there is certain gap in the published literature on the MC 
methods. The elementary MC methods like rejection according to weight, 
branching (multichannel method) or mapping of variables are so simple and 
intuitive that it seems to be not worth to write anything on them. On the other 
hand in the practical MC applications these methods are often combined in 
such a complicated and baroque way that sometimes one may wonder if the 
author is really controlling what he is doing, especially if the documentation 
is incomplete/sparse and we lack commonly accepted terminology graphical 
notation for describing MC algorithms. There are also many mathematically 
oriented articles and textbooks on the MC methods which in my opinion seem 
to have very little connection with the practical every day work of someone 
constructing MC program. The aim of this report is to fill at least partly 
this gap. This report is extension of a section in ref. |]l|. I would like also to 
recommend the classical report [0| of James on the elementary MC methods. 
Section 1 describes elementary MC methods of the single and multiple 
level rejection (reweighting) , including detailed description of the weight book- 
keeping and recipes for keeping correct normalisation for the total integrand 
and the differential distributions (histograms). Section 2 introduces branching 
(multi-channel) method and section 3 demonstrates the simplest combinations 
of the rejection and branching. Section 4 and 5 reviews more advanced as- 
pects of combining rejection and branching, in particular I show examples of 
"equivalent" algorithms, i.e. different algorithm which provide the same distri- 
butions, pointing out advantages of certain arrangements of the the rejection 
and branching. Another common method of the variable mapping is discussed 
in section 5, again in the context of various arrangement of the rejection and 
branching methods. 



2 Rejection, compensating weights 

We intend to generate randomly events, that is points x = {xi,X2,X3, ...,x„), 
according to a distribution 

within certain domain fl and, simultaneously, we want to calculate (estimate) 
the integral 

a= [p{xi)dx''. (2) 



as precisely as possible. In our notation, change of integration variables induces 
in p-density a Jacobian factoiQ 



piVi 



dx 



p{xi). (3) 



dy 
The normalised to unity probability density is simply 

dy = -d^'a, [ dy= [ ^ dx"" = 1. (4) 

a J J dx"^ 

In fig. |l](a) I show the simple single-line algorithm with rejection according 
to a weigh^ defined as a ratio of an exact distribution p to approximate p^^^ 

We assume that we are able to generate randomly events according to p^^^ and 
we know the numerical value of the integral 



a' 



(1) = / p(xif) dx"". (6) 



^ I assume that the reader is famihar only with the elementary calculus methods, and 
purposely avoid to call p a measure. 

^ I assume that the distribution p may be quite singular, in particular it may include 
Dirac S functions. The weight I expect to be analytical almost everywhere. It may, however, 
include discontinuities. 
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Figure 1: Single-line MC algorithm: versions (a) constant-weight events and (b) 
variable-weight events. 



The box 



P 



(i)i 



Xi 



represents part of the algorithm (part of computer code) 
which provides us events according to p*^^-* and the value of a^^^ . The content 
of the box p^^^ (xi) can be a complicated algorithm (computer code) and we 
treat it as a "black box", i.e. we may know nothing about its content. In 
particular it can be taken from ready-to-use library of the programs generating 
standard distributions, or even a physical process providing "random events". 
The circle with the return line depicts rejection method. For each event leaving 

we calculate weight w and we accept event (downward arrow) if 



i(i), 



Xi 



rW < w. 



(7) 



where W is a maximum weight and r is a uniform random number < r < 1. 
Otherwise event is rejected (return arrow in the graph). It is easy to see that 
events exiting our algorithm are generated according to density p. Probability 
density of accepted events at the point Xj is equal to product of probability 
d^pW of events produced in the box p*-^-* times the probability ^accept = w{x)/W 



of accepting an event 

dy{x) = Udy^^^ paccept = M ^(1) ' -^ , (8) 

where M is normalisation factor. Substituting the definition of the weight 
w{x) = d"'a/d"'a^^^ and imposing normahsation condition J d"'p{x) = 1 we 
find 

and as a result we obtain 

d'^pix) = ^ (10) 

a 

as desired. The dashed box p{xi) can be used as part in a bigger algorithm (box 
in a bigger graph) because it provides events generated according to density 
p{xi). The question is whether within the dashed box we are able to estimate 
the integral a. In fact we can, and there are even two ways to do it. In the 
first method we use the ratio on accepted events A^ to the total number N^^' 



of the events generated in the box p^^\xi) . The number of accepted events 



is, on the average, proportional to probability of generating an event event 
c^"o-{i)/o-(i) times probability of accepting an event 

averaged all over the points Xi in the entire integration (generation) domain Q 

l^^ = iV«^, ^(i) = Wa^'\ (12) 

The above relation can be used to calculate the unknown integral a as follows 

using known a^^^ and counting accepted events A^. Of course, the error of the 
above estimator of a is given by the usual statistical error from the binomial 



distribution. In the second method we calculate the average weight where the 
averaging is done over all accepted and rejected events 

<w>=j^^ w{x) = ^. (14) 

n 

The above gives us second equivalent estimator of the unknown integral a in 
terms of the known a^^^ and the measured average weight 

a = a^^^ < w >= a^^^ <w>. (15) 

Another often asked question is: how to calculate the integral Aa over a 
suhdomain Afi, which is for instance a single bin in a histogram? The following 
formula can be easily derived 

where AA^ is number of events falling into subdomain Afi. A particular case 
is the proper normalisation of the histogram. Let us take the one dimensional 
distribution 

^ = J ra S{z - z{x,)) (17) 

n 

which we estimate/calculate by means of collecting generated events in a his- 
togram with rib equal bins within a (^min, -^max) range. The relevant formula 
reads 

da a AN riboAN nta^^'^AN ,^^, 



dz AzN (2;max - Z^injN (2;max " ^min)A^(^) 

In fig. |l](b) I show the same algorithm for variable-weight events. In this 
case we do not reject events but we associate the weight w with each event. 
For the total integral over entire ^2 I may use the same formula of eq. (^) as 
for the constant- weight algorithm. In the case of the histogram we accumulate 
in each bin a sum of weights Yl ^- The properly normalised distribution is 

zSibin 

obtained as follow^ 



da riha^^^ y--^ n^a^^^ 

UZ l^^^max ZYiiiiiJ-ly l^^max ^min/-'"' 



(T) 5Z ^ = r..„.„-l..w(i) T. ^ (19) 
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Figure 2: Single-line MC algorithm: versions (a) with constant-weight events, nested 
rejection loops, and (b) with variable-weight events. 



In fig. §(a) I show the simple single-hne algorithm with several nested 
rejection loops. The meaning of the graph is rather obvious. The original 
distribution po goes through n-step simplification procedure 
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(2). 
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(n) 



(20) 



•^Operationally this formula is identical for the case of variable- weight and constant- 
weight events and is therefore handy in practical calculations Q. The values of a'^^^ and 
N^^' can be accumulated using a dedicated, one-bin-histogram. This arrangement facilitates 
dumping all MC results on the disk and restarting MC run at the later time. 



and the compensation weights 

are used for rejections "locally" in a standard way: each weight w'^^'^ is com- 
pared with rW'^^'^ where < r < 1 is uniform random number, and if w'^^'^ < 
riy^'^) the event is accepted (down- ward arrow), otherwise rejected (return 
loop). The average weights < w'-'^-* > are calculated for each rejection loop. 

The most inward box 



pW(x, 



represents generation algorithm of the points Xi 

according to maximally simplified (crude) distribution p*^"^ for which we know 
the integral cr^"^ = j p*^"-* analytically. The integral of the original distribution 
p is obtained from the crude integral and the average weights 

/n n 

p{xi) rfx" = a(") W < w^'-^^ >= a(") Y[ < w'^''^^ > (22) 

The above is completely standard and can be found in ref. [|I|. Note also that 
all n rejection loops may be combined into single rejection loop with the weight 
being product of all weights along the line 



w = 

i=l 



Y[w^'-^\ (23) 



Usually, the version with nested loops is more efficient and the corresponding 
program is more modular. The weights for the internal loops are related to 
more technical aspects of the MC algorithm (Jacobians) and do not evolve 
quickly (during the development of the program) while external weights cor- 
respond to physics model and may change more frequently. It is therefore 
profitable to keep in practice several levels of the weights. 

Finally, we may decide to perform calculation for weighted events. In 
fig. |](b) I show the version of the simple single- line MC algorithm with variable- 
weight events. The event at the exit of the graph gets associated weight w 
which is the product of all weights along the line. 

3 Branching 

In fig. ^ I show the general MC algorithm with branching into n branches. 
This kind of algorithm is used when the distribution to be generated p can be 




Figure 3: Pure branching. 



split into sum of several distinct (positive) subdistributions 



Pi^i) = Yl 



Pi 



(24) 



i=l 



and we are able, one way or another, to generate each pi separately. Usu- 
ally each Pi contains single peak or one class of peaks in the distribution p. 
In the beginning of the algorithm (black disc) we pick randomly one branch 
(subdistribution) according to probability 



Pk. 



I Pk 

EIp. 



Cik 



a 



(25) 



i.e. we have to know in advance the integrals ai = j pi analytically or numer- 
ically. Typically, in each branch one uses different integration variables Xi to 
parameterise the integral. The particular choice of variables will be adjusted 
to leading "singularities" in the branch distribution. 

Let us give a formal proof of the correctness of the branching method. 



dy{x) = Y,Pkdyk{x) = Y^ 



o-fc d"-akix) 



a 



o-fc 



-^d^^akix) 



d^'aix) 



a 



(26) 



Finally, note that at the exit of the branched MC algorithm (exit of the 
graph in fig. ^), we may be forced for various reason (saving computer mem- 
ory), trash all information on the origin of an event, consequently, we may be 
not able to use any information specific to the branch from which the event 
has came. This is rather important practical aspect to be kept in mind. 

4 Branching and internal rejection 



(a) 



(b) 





Figure 4: Branching and weights. Two equivalent generation methods with individual 
compensation weight for each branch. The case (a) with local return loop for each 
branch, (b) with common return point before branching. See text for more explanations 
how to to transform algorithm (a) into (b). 

In fig. ^(a) I show the simplest combination of the branching algorithm with 
the standard rejection method. This type of the algorithm is potentially very 
efficient but is used not so often in the MC event generators because it requires 
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that we know in advance branching probabihties Pk- In most of situations we 
do not know them analytically. In principle, they can be calculated numerically 
using (Tfc = 0-^, < Wk > but this is not very handy because it requires two MC 
runs - first run which determines average weights < w^ > an second run with 
Pk calculated from < Wk > from the first run. 

The solution to the above problem is the variant of the algorithm presented 
in fig. ^(b) where 

and all rejection returns are done to a point before the branching point. Let 
us check correctness of the algorithm in fig. |5(b). The probability den- 
sity d"'p{xi) at the point Xi at the exit of the algorithm (graph) is propor- 
tional to product of probability of getting event in the box 

d^Pk (x) = d'^al {x)/a\. times probability of accepting event being Wk{x) = 
d^ak{x)/d^a\^ (x), all that averaged over branches with probabilities Pk- The 
same statement is expressed mathematically as follows: 



(1)^ 
Pk KXi 



equal 



d>(x,)=ArVPfcrf>i') 



k 



[X) Wk[X) 



^^^^ d''al^\x) d''ak{x) _ ,,rf"a(x) ^^^ 






a):> d-al'>ix) 

Normalisation J\f = a^^^/a is determined from the condition j^d^p{xi) = 1. 
Finally we obtain d"-p{xi) = d"'a{xi)/a, as expected. 

The total integral is a sum over integrals from all branches a = ^^ ak 
where for each branch we may use the formula Ck = Uk < Wk > (see eq. (|15|)). 
Slightly rearranged formula 



a 



J2 (^k^ < Wk >= a^^^ ^ Pfc < uifc >= a(^) <w>, (29) 



where in < w > we average also over branches, is a straightforward general- 
isation of eq. (|T3|). We can also generalised formula of eq. (|T^ for the total 
integral based on the number of accepted events 

MjL--^i^).^kNk 



- = -''^^ = -''^^^ (30) 

A; 



ATd. j.^^(i) 
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Proof, number A^ of events accepted in all branches is 

^ = E^^ = E^i'^^ (31) 

k k ^k 

where A^^ is total number of events in a given branch (that is before rejection), 
see also eq. (|T^) . Inserting A^^ = N^^^Pk we get A^ = a ja'^^^ and therefore 
eq. (§g). 

Summarising, we see that the two algorithms in fig. ^ are equivalent, ie. 
they provide the same distribution of events and the same total integral. The 
algorithm (a) is probably slightly more efficient but also more difficult to realize 
because it requires precise knowledge of the branching probabilities P^. The 
algorithm (b) is usually less efficient but the branching probabilities P^ are 
easier to evaluate because they correspond to simplified distributions p^ . 

Let us now consider the case of variable-weight events for which all return 
loops in fig. ^ are removed and the two cases (a) and (b) are identical. The 
event at the exit of the algorithm carries the weight from one of the branches! 
For the calculation of the total integral we may use the same formulas of 
eqs. ( ^0|) and (|l^) as for the constant-weight method. Let us check whether 
we may proceed as usual for the calculation of the integrals in the subdomain 
AQ being single bin in any kind of the differential (or multi-differential) dis- 
tribution. Let us generate long series of A^ weighted events and accumulate 
sum of the weights which fall into AQ. Of course, in the sum of accumulated 
weights we have contributions from all branches 

Xi^An '^ An '^ An 

Substituting definitions for PJ, and of d'^pl. we get 

^^^^ ''rf"cr[^^(x) d^'akix) 






r(l) Wn^(l)^ 



a;,GAn k ^ ^^ ^k ""^fc (^) 



N 



5^|rf"a,(x) = ^Aa 



(33) 



'^ An 



Reverting the above formula we get an estimate of the integrated or (multi-) 
differential distribution in terms of sum of the weights 

Aa^ I d-a, = ^ E ^(^^) = W^ ^^"^^^ ^^^^ 

AQ Xi^AQ XiSAn 
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Let us finally discuss the role of the maximum weight Wk and the appar- 
ently unnecessary complication of keeping two kinds of crude distributions a^^^ 
and a^^\ For variable- weight events without branching, W is merely a scale 
factor which cancels out completely among < w > and d"*-^-* in the overall 
normalisation. Its only role is to keep weights in certain preferred range, for 
example it is often preferred to have weights of order 1. In the case of the 
variable-weights with branching the relative values of Wk start to play certain 
role. Although, for infinite number of events, final results (distributions and 
integrals) do not depend on Wk the efficiency (convergence) of the calculation 
depends on the relative ratios of Wk [Q • The maximum weights Wk are more 
important/useful for constant-weight algorithm. They are chosen in such a 
way that w < 1. The rejection method does not work if this condition is not 
fulfilled. In most cases we do not know analytically the maximum weight for 
a given approximate p*-^-* and the maximum weight W is adjusted empirically. 
Of course, the same adjustments can be done by scaling (multiplying by a 
constant) the entire p^^^ but the long-standing tradition tells us to keep p^^^ 
unchanged and rather introduce an explicit adjustment factor W. In the case 
of the constant- weight algorithm the values of Wk determine the efficiency (re- 
jection rate) in each branch. Let us stress again that it is always possible to 
enforce Wk = 1 and the presence of Wk is in fact pure conventional. 

5 Branching and external rejection 

In fig. ^ we transform our algorithm one step further. In fig. ^(a) we repeat 
essentially the algorithm of fig. ^(b) while in fig. |^(b) we have single rejection 
outside branched part. The weights Wk and w are related such that two algo- 
rithms are equivalent, that is both algorithms provide the same distributions 
and calculate the same integral. The relations is very simple 



w 



$:4^)(.) wkix) = E ^ M^) = iSr = T^y-- (35) 



k P^^Hx) "^^"^ P«(x) Epi^^(-) 



k 



The algorithm of fig. P(b) can be also obtained independently by combining 
in a straightforward way the rejection and branching methods. We proceed as 
follows: first we simplify 

p{x) ^ p^^\x) (36) 
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Figure 5: Branching and weights. Two equivalent generation methods: (a) with 
compensation weight for each branch, (b) with common compensation weight for all 
branches, See text for more explanations how to to transform algorithm (a) into (b). 



and this simplificatioii is compensated by the weight 

p{x) 



w 



P 



Wi 



X) 



(37) 



and for the internal part 
in section & Consequent 



p(i)(x) 



we apply the branching method as described 
y, we may apply all standard formulas for the calcu- 
lation of the total integral, for instance a = a^^^ < w >, and we do not need to 
worry about additional proofs of the correctness of the algorithm of fig. ^(b); 
we already know that it generates properly the distribution p(xj). 

Note that the algorithm of fig. ^(b) looks more general than that of fig. ^(a) 
in the following sense: the simplified distribution p*^^^ can be written a sum 
from contributions from all branches p^^-* = X] Pfc ^^^ ^he same is true for p 
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in the case (a) while, in general, it needs not be true in the case (b). In other 
words algorithm (a) can be transformed into (b) but the transformation in the 
opposite direction is less obvious. There is always a trivial transformation of 
(b) into (a) in which we set Wk = w. In other words, if in the graph (a) all 
weights Wk are the same then we are allowed to contract all rejection loop into 
a single one as in graph (b), and vice versa. This sounds trivial but may be 
useful in the case of the several levels of the compensation/rejection weights. 
In the case of the variable-weight we simply omit the rejection return-loops 
and sum up weights of the events. Again, since fig. ^(b) is a direct superposition 
of the standard weighting and branching methods all standard rules apply. It 
is amusing to observe that in spite of the fact that the the weights in the two 
algorithm of figs. |^ are different, the two algorithm provide exactly the same 
distributions and integrals - only efficiency may differ. Which one is more 
convenient or efficient depends on the details of a particular problem. In the 
next section we shall elaborate on advantages and disadvantages of the two. 



6 Branching, compensating weights and map- 
ping 

Branching is a very powerful tool in the case of the distribution with many 
peaks. Usually, we are able to split 



p(^) = Yl 



Pk[X) 



(38) 



in such a way that each pk{x) contains one kind of a spike in the distribution. 
In each branch we generate different spike with help of the dedicated change 
of the variables Xi -^ yl such that in 



Pk{xi) = Pk{yf^) 



dx 



dyC'^ 



(39) 



new distribution pki^y^ ) is completely flat and the whole spike is located in 
the Jacobian function \dy^''ydx\. In the following approximation 



pk{x^) ^ PkiXi) =r^ 



(1), 



,(1) 



dx 
dy^^^ 



(40) 
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the pkiVi ) is simply replaced by the constant residue r\. . The relevant com- 
pensating weight reads as follows 



Pk[,Xi) Pk[Xi 

Wk 



Qy{k) 



dx 



(41) 



(1)/ \ (1) 

The approximate cross sections needed for branching probabilities are 

^^^ = I d-xp^^^ = r^^ I ry^'^ = r« V{nk) (42) 



where ilk is the integration domain expressed in the variables y^''^ and V{rik) 
is simply the Cartesian volume of the domain ^2^ in the y-space. 

Now comes the interesting question: Which of the two algorithms of fig. § 
is more convenient and/or efficient. Finally, the answer will always depend on 
the individual properties of a given distribution p. Nevertheless, let us point 
out some general advantages of the case (a). In the algorithm of fig. ^(a) we 
need a single weight Wk, for the k-th branch from which an event originates. 
The distribution p^ might be a simple function directly expressed in terms 
of Xi but it may happen that the Jacobian Idy^'^ydx] is a more complicated 

(k) 

function which requires the knowledge of y^ and the whole transformation 

(k) 

y\ ) ^ ^._ Qf course, it is not the problem for the single branch, as in the case 
(a), since in the process of generating an event we calculate primarily (generate 
randomly) the point yl and we transform it into Xi; the calculation of this 
Jacobian is usually a byproduct of the generation process. The situation in 
the algorithm of fig. |^(b) might be worse because in this case the global weight 



p{x 

w — 



J2pk{x) Y.pk{x) 

k k 



k k 



dyW 
dx 



(43) 



contains in the denominator p^. (x) (or Jacobians) for all branches. Conse- 
quently, in some cases we may be forced to perform for each event, (often 
quite complicated) transformations Xi -^ y\ and calculate Jacobians for all 
branches. This is cumbersome, especially if we have large number of branches. 
It may also consume a lot of computer time. Just imagine that due to per- 
mutation symmetry we have A^! branches - even if A^ is some moderately 
high number the summation over all branches might consume almost infinite 
amount of computer time. The procedure of summation over branches might 
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be also numerically instable in the case of very strong spikes in p^ (x) because 
computer arithmetic is usually optimised for a given single spike in a given 
branch and it might break down for other branch, unless special labour-hungry 
methods are employed. 

We conclude that the algorithm in fig. |^(a) seems to have certain advantages 
over the algorithm in fig. |^(b) although in many cases the difference might be 
unimportant and one might find algorithm (b) more simple (it is perhaps easier 
to explain and document). 




Figure 6: Branching and weights. Practical example. 

In the two examples of fig. ^ we have required that p{x) can be split im- 
mediately into a sum of singular terms, each of them generated in a separate 
branch. In the real life it is often not true and fig. illustrates more realistic 
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scenario. Here, before branching can be applied, we make simplification 

p{xi) -^ p^^\xi) (44) 



compensated by the weight 



w 



(0) 



p{xi) 



p{Xi) 



9(1)1 



Xi 



Ep': 



(1), 



(45) 



Jji 



This simplification removes fine details from p{x) (for example quantum me- 
chanical interferences) which are numerically unimportant and prevent us from 
writing p{x) as a sum of several positive and relatively simple singular terms. 
(Note that in w^^' we still do not have any Jacobians and we know nothing 
about transformation to variables y^ !) The branching is done in the next 
step for p^^\xi) and the weights 



w 



(1) 



P?\x^) 



(46) 



compensate for the fact that the Jacobian \dy^^ydx\ do not match exactly 
the p\. (xj), see discussion above. As in the example of fig. ||, wl involves 
elements of the calculation for a single branch only (Jacobian!). The branching 
probabilities P^ are easily calculated using known integrals cr^' 
The total weight 



^(2) 



w = W^ 'w 



{o).„(i) 



p[Xi) p^k\^i) 


p{xd Pk 


{X^) 
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Ep'k\^^) r« 


dyW 
dx 



(47) 



consists of global component w^ ' which knows nothing about technicalities 
of generation in the individual branch k and the local component w^ which 
bears responsibility for technicalities of generation in the individual branch k 
(in particular it may encapsulate cumbersome Jacobian functions). The lack 
of sum over k in eq. (|47|) is not a mistake - the local part of the weight is 
calculated only for a SINGLE fc-th branch!!! This is a great practical advantage 
and such an arrangement of the weights is generally much more convenient 
contrary to the algorithm being the straightforward extension of the algorithm 
in fig |^(b) for which the weight 
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^(0)^(1) 
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(1), 



Xi 
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(2), 



Xi 



Ep'k 

k 



(1), 
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(1) 
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%(fc) 
dx 



(48) 
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does include sum all over branches for the local part w^^^ of the weight. The 
efficiency of the two algorithms depends on the details of the distribution and 
in order to see which of the above of two algorithms is more efficient has to be 
checked case by case. 

Another important advantage of the algorithm of eq. (^) is that the part of 
the algorithm generating p^ (xi) can be encapsulated into single subprogram 
which generated Xi according to fc-th crude distribution p^. {xi) and provides to 
the outside world the weight wl . The main program does not to need to know 
more about any details of the algorithm encapsulated in the subprogram. The 
rather annoying feature of the algorithm of eq. (|48|) is that for the construction 
of the total weight in the main program we need to know all nuts and bolts of 
the sub-generator for p^ (xj), thus encapsulation cannot be reahzed, leading 
to cumbersome non-modular program. 

Finally let us note that the total integral is calculated with the usual for- 
mula 

^(2) ^,„(1),„(0) ^ ^(2) =\^^(2) 



a = a'^' < w'^' w"-"' >, a 



E*?' («) 



where we understand that for < w^^^ > in eq. (|47|) the average is taken over 
all branches. For variable-weight events the the weight is w = w^^^w^^^ where 
■u;(i) = w, for the actual k-th. branch. 



-'k 



7 Conclusions 

I have described how to combine three elementary Monte Carlo methods re- 
jection, branching and change of variables in the difficult task of generating 
multi-dimensional distributions. I have spend some time giving formal math- 
ematical proofs of these methods, thus providing useful reference for papers 
describing MC event generators, where usually authors lack space/time to 
discuss such proofs. I have also discussed in quite some detail advantages 
and disadvantages various combinations of branching and rejection methods. 
Again, although these aspects may be known to authors of various MC pro- 
grams they are practically never discussed. The most important for practical 
applications is probably the discussion on the advantages and disadvantages 
of the two arrangements of rejection and branching in fig. ^. 
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